- Path: keats.ugrad.cs.ubc.ca!not-for-mail
- From: c2a192@ugrad.cs.ubc.ca (Kazimir Kylheku)
- Newsgroups: comp.lang.c
- Subject: Re: I/O Question
- Date: 21 Jan 1996 07:45:11 -0800
- Organization: Computer Science, University of B.C., Vancouver, B.C., Canada
- Message-ID: <4dtn27INNn43@keats.ugrad.cs.ubc.ca>
- References: <4dp1nm$mu6@taco.cc.ncsu.edu>
- NNTP-Posting-Host: keats.ugrad.cs.ubc.ca
-
- In article <4dp1nm$mu6@taco.cc.ncsu.edu>, Alex David Groce <adgroce> wrote:
- >I'm relatively inexperienced in C, was started with C++ here at NC State.
- >Anyway, working on a project where speed really matters, I'm reading
- >from a file containing 100001 integers in text format. I was originally
- >using C++'s iostreams, but discovered that using stdio.h sped things up
- >by an incredible factor. I guess the C++ routines are so general that
- >they're slllooooooooww as molasses. Anyway, I'm using scanf now, as in
- >
- >while (scanf("%i", &list[count++]) != EOF); // Read elements.
- >count--; // Avoid good old OBOB, the Off-By-One-Bug.
- >
- >Would it be worth my time to learn buffered I/O techniques, read or
- >getchar or such, to input this data? Every little bit of time
- >counts... Or is there a good example somewhere of a fast way to do
- >this?
-
- Try this!
-
- %{
-
- /* Lex lexical analyzer */
-
- #include <stdio.h>
- #include <stdlib.h>    /* for atoi() */
-
- #define MAXSIZE 100001  /* the poster's file holds 100001 integers */
-
- int number;
-
- %}
-
- ws [\n\t ]
- digit [0-9]
- integer "-"?{digit}+
-
- %%
-
- {ws} { /* skip */ }
- {integer} { number = atoi(yytext); return 1; }
- . { /* skip illegal char */ }
- %%
-
- int main(void)
- {
- int list[MAXSIZE];
- int count = 0;
-
- /* open a file, assign it to FILE *yyin */
- /* call yylex() to get numbers one by one */
- /* reads standard input by default */
-
-     while (count < MAXSIZE && yylex()) {
- /* 'number' contains number that was read from yyin */
- list[count++] = number;
- }
-
-     return 0;
- }
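-
- To read from a named file instead of standard input, point yyin at the file
- before the first call to yylex(). A sketch (the file name is only an
- example):
-
- extern FILE *yyin;
-
- yyin = fopen("numbers.txt", "r");
- if (yyin == NULL) {
-     perror("numbers.txt");
-     return 1;
- }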
-
-
-
- Pump the spec above through lex (or better, GNU flex), then compile the
- generated program "lex.yy.c" and link it with the -ll library (or -lfl if
- you used flex).
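-
- Concretely, that is something like the following (the spec file name
- "scan.l" is just for illustration):
-
- flex scan.l          # or "flex -f scan.l" for the fast-table scanner
- cc lex.yy.c -o b.out -lfl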
-
- The above makes a scanner competitive with the scanf() version. When I built
- the program with flex, it came out just a bit slower than scanf(), but with
- flex's -f option, which builds a faster scanner at the expense of table
- memory, it beat scanf() by about 30% of the user CPU time in the run below.
-
- My test was an 8MB file containing the same number repeated over and over
- again, created by just doing a "yes 12345678 > file" and letting it run for a
- few seconds.
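-
- For reference, a.out was built from a loop of roughly this shape (a sketch,
- not the exact code; note the == 1 test, since scanf() returns the number of
- successful conversions, and a plain EOF test would spin forever on a stray
- non-numeric character):
-
- #include <stdio.h>
-
- int main(void)
- {
-     int n;
-
-     /* scanf() returns 1 for each successfully converted number */
-     while (scanf("%d", &n) == 1)
-         ;
-     return 0;
- }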
-
- Here, a.out is the scanf() version, and b.out is the flex -f generated version:
-
- turing:~/src/test$ time a.out < testfile
- 9.26user 0.76system 0:12.43elapsed 80%CPU (0avgtext+0avgdata 0maxresident)k
- 0inputs+0outputs (20major+44minor)pagefaults 0swaps
- turing:~/src/test$ time b.out < testfile
- 6.70user 0.54system 0:09.58elapsed 75%CPU (0avgtext+0avgdata 0maxresident)k
- 0inputs+0outputs (22major+39minor)pagefaults 0swaps
- turing:~/src/test$ uname -a
- Linux turing 1.3.42 #22 Sat Jan 13 12:22:51 PST 1996 i586
-
- I did one "dry run" before that just to make sure that one program wasn't
- getting an unfair advantage due to caching.
-
- The regular expression patterns that define the lexical analyzer are compiled
- into a static table by lex, whereas scanf() has to parse its format string each
- time you call it, which is wasted overhead if the format string doesn't change
- from invocation to invocation.
-
- I have written many scanners using lex, so I can write dinky things like the
- above off the top of my head. There are better ways to use lex, though. You can
- make it more efficient by being clever with how you define the pattern matches.
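-
- For example, one easy win in the spec above is to let the whitespace rule
- gobble a whole run at a time, so the scanner executes one action per run of
- blanks rather than one per character:
-
- {ws}+    { /* skip a whole run of whitespace in one action */ }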
-
- In my experience, a lex-generated scanner is typically at least ten times
- faster than programs that use scanf() as soon as your scanning starts to become
- non-trivial. Hand-written code based on the standard IO functions becomes hard
- to debug and maintain compared to a lex spec (though the latter has its own
- unique challenges). It's very good for things like parsing configuration
- files, importing databases from text files, scanning logs, writing lexical
- analyzers for compilers, and building text filters.
- One thing is for sure, though: lex-generated code is bigger than a library
- call!
- --